{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "adjusted-visit",
   "metadata": {
    "id": "continental-chosen"
   },
   "source": [
    "## Getting started"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "finished-property",
   "metadata": {
    "id": "according-personality"
   },
   "source": [
    "#### Standard imports and installations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "surrounded-makeup",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "xKDwaemyJF-R",
    "outputId": "3b8db4cf-40a3-4190-d97b-c181a6d858d5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[K     |████████████████████████████████| 122kB 18.4MB/s \n",
      "\u001b[K     |████████████████████████████████| 1.8MB 37.4MB/s \n",
      "\u001b[K     |████████████████████████████████| 296kB 48.8MB/s \n",
      "\u001b[K     |████████████████████████████████| 337kB 55.1MB/s \n",
      "\u001b[K     |████████████████████████████████| 2.2MB 46.8MB/s \n",
      "\u001b[K     |████████████████████████████████| 133kB 59.6MB/s \n",
      "\u001b[K     |████████████████████████████████| 71kB 10.3MB/s \n",
      "\u001b[K     |████████████████████████████████| 102kB 15.3MB/s \n",
      "\u001b[K     |████████████████████████████████| 81kB 11.0MB/s \n",
      "\u001b[K     |████████████████████████████████| 133kB 56.7MB/s \n",
      "\u001b[K     |████████████████████████████████| 7.3MB 51.1MB/s \n",
      "\u001b[K     |████████████████████████████████| 92kB 13.0MB/s \n",
      "\u001b[K     |████████████████████████████████| 133kB 60.5MB/s \n",
      "\u001b[K     |████████████████████████████████| 3.2MB 45.3MB/s \n",
      "\u001b[K     |████████████████████████████████| 5.8MB 52.2MB/s \n",
      "\u001b[K     |████████████████████████████████| 71kB 10.6MB/s \n",
      "\u001b[K     |████████████████████████████████| 71kB 10.3MB/s \n",
      "\u001b[K     |████████████████████████████████| 51kB 8.8MB/s \n",
      "\u001b[?25h  Building wheel for outdated (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "  Building wheel for asciitree (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "  Building wheel for littleutils (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "\u001b[31mERROR: albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.9 which is incompatible.\u001b[0m\n",
      "\u001b[31mERROR: botocore 1.20.17 has requirement urllib3<1.27,>=1.25.4, but you'll have urllib3 1.24.3 which is incompatible.\u001b[0m\n",
      "\u001b[31mERROR: boto3 1.16.39 has requirement botocore<1.20.0,>=1.19.39, but you'll have botocore 1.20.17 which is incompatible.\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "!pip install hub -q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "running-thesis",
   "metadata": {
    "id": "KB9q8bHps86T"
   },
   "outputs": [],
   "source": [
    "!hub login"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "defensive-smile",
   "metadata": {
    "id": "baking-authentication"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from tqdm import tqdm\n",
    "import hub\n",
    "from hub.schema import Text, ClassLabel\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "industrial-google",
   "metadata": {
    "id": "BApS3hkD9hkY"
   },
   "outputs": [],
   "source": [
    "!wget https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz --quiet\n",
    "!tar -xf aclImdb_v1.tar.gz"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "funky-continuity",
   "metadata": {
    "id": "9J7u1Ysso-q8"
   },
   "source": [
    "Reading one sample review"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "peaceful-completion",
   "metadata": {
    "id": "pregnant-shoot"
   },
   "outputs": [],
   "source": [
    "filename = \"aclImdb/train/pos/0_9.txt\"\n",
    "with open(filename, \"r\") as fin:\n",
    "    line = fin.readline()\n",
    "fin.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "union-enhancement",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 122
    },
    "id": "offensive-diameter",
    "outputId": "bdf04d56-c0b7-4a99-f6c0-dc502a4a046e"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "'Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as \"Teachers\". My 35 years in the teaching profession lead me to believe that Bromwell High\\'s satire is much closer to reality than is \"Teachers\". The scramble to survive financially, the insightful students who can see right through their pathetic teachers\\' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I\\'m here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn\\'t!'"
      ]
     },
     "execution_count": 12,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "line"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "driving-reynolds",
   "metadata": {
    "id": "automated-welding"
   },
   "source": [
    "#### Collecting all filenames for processing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "verbal-spray",
   "metadata": {
    "id": "premier-deficit"
   },
   "outputs": [],
   "source": [
    "file_names = []\n",
    "reviews_df = pd.DataFrame(columns=[\"Review\", \"Label\"])\n",
    "for root, dirs, files in os.walk(\"aclImdb/train/pos\"):\n",
    "    file_names.append(files)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "searching-array",
   "metadata": {
    "id": "civilian-wings"
   },
   "source": [
    "#### Appending all positive reviews to the DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "rotary-homeless",
   "metadata": {
    "id": "special-anxiety"
   },
   "outputs": [],
   "source": [
    "root_dir = \"aclImdb/train/pos/\"\n",
    "count = 0\n",
    "for i in file_names[0]:\n",
    "    with open(root_dir + i, \"r\") as fin:\n",
    "        reviews_df = reviews_df.append(\n",
    "            {\"Review\": fin.readline(), \"Label\": 1}, ignore_index=True\n",
    "        )\n",
    "        count += 1\n",
    "fin.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "invalid-dylan",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 424
    },
    "id": "statistical-negative",
    "outputId": "22bdc236-3142-4428-c88a-1a16d541ee68"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Review</th>\n",
       "      <th>Label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>I have watched this episode more often than an...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>I really enjoyed \"Doctor Mordrid\". This is a l...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Hickory Dickory Dock was a good Poirot mystery...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Fragile Carne, just before his great period. A...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>So I don't ruin it for you, I'll be very brief...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12495</th>\n",
       "      <td>Film can be a looking glass to see the world i...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12496</th>\n",
       "      <td>A message movie, but a rather good one. Outsta...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12497</th>\n",
       "      <td>Kurosawa weaves a tale that has a cast of char...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12498</th>\n",
       "      <td>When you compare what Brian De Palma was doing...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12499</th>\n",
       "      <td>This series is set a year after the mission to...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>12500 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                  Review Label\n",
       "0      I have watched this episode more often than an...     1\n",
       "1      I really enjoyed \"Doctor Mordrid\". This is a l...     1\n",
       "2      Hickory Dickory Dock was a good Poirot mystery...     1\n",
       "3      Fragile Carne, just before his great period. A...     1\n",
       "4      So I don't ruin it for you, I'll be very brief...     1\n",
       "...                                                  ...   ...\n",
       "12495  Film can be a looking glass to see the world i...     1\n",
       "12496  A message movie, but a rather good one. Outsta...     1\n",
       "12497  Kurosawa weaves a tale that has a cast of char...     1\n",
       "12498  When you compare what Brian De Palma was doing...     1\n",
       "12499  This series is set a year after the mission to...     1\n",
       "\n",
       "[12500 rows x 2 columns]"
      ]
     },
     "execution_count": 16,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "assured-uniform",
   "metadata": {
    "id": "running-exemption"
   },
   "source": [
    "#### Appending all negative reviews to the DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "mighty-integral",
   "metadata": {
    "id": "alternative-necessity"
   },
   "outputs": [],
   "source": [
    "file_names = []\n",
    "for root, dirs, files in os.walk(\"aclImdb/train/neg\"):\n",
    "    file_names.append(files)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "artistic-powell",
   "metadata": {
    "id": "lovely-minute"
   },
   "outputs": [],
   "source": [
    "root_dir = \"aclImdb/train/neg/\"\n",
    "count = 0\n",
    "for i in file_names[0]:\n",
    "    with open(root_dir + i, \"r\") as fin:\n",
    "        reviews_df = reviews_df.append(\n",
    "            {\"Review\": fin.readline(), \"Label\": 0}, ignore_index=True\n",
    "        )\n",
    "        count += 1\n",
    "fin.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "challenging-qatar",
   "metadata": {
    "id": "complimentary-memphis"
   },
   "outputs": [],
   "source": [
    "max_length = 0\n",
    "for i in reviews_df[\"Review\"]:\n",
    "    if len(i) > max_length:\n",
    "        max_length = len(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "varied-subdivision",
   "metadata": {
    "id": "flexible-lexington"
   },
   "source": [
    "### Uploading the DataFrame to Hub"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "collaborative-determination",
   "metadata": {
    "id": "american-affiliate"
   },
   "outputs": [],
   "source": [
    "# Please run this cell only once. Once you have uploaded the dataset, you can simply fetch it by running\n",
    "# hub.Dataset(url)\n",
    "\n",
    "# Replace url with your username and dataset name. for example, if your name is Akash and your dataset is\n",
    "# FlipkartReviews, then\n",
    "# url = Akash/FlipkartReviews\n",
    "# Before you can upload datasets, please login into Hub. Run the first cell.\n",
    "\n",
    "url = \"dhiganthrao/IMDB-MovieReviews\"\n",
    "\n",
    "# Uncomment the following lines if you\"re uploading *this* dataset for the first time.\n",
    "# my_schema = {\"Review\": Text(shape=(None, ), max_shape=(max_length, )),\n",
    "#              \"Label\": ClassLabel(num_classes=2)}\n",
    "\n",
    "# ds = hub.Dataset(url, shape=(25000,), schema=my_schema)\n",
    "# for i in tqdm(range(len(ds))):\n",
    "#     ds[\"Review\", i] = reviews_df[\"Review\"][i]\n",
    "#     ds[\"Label\", i] = reviews_df[\"Labels\"][i]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "general-modern",
   "metadata": {
    "id": "3Jarp3GHtcis"
   },
   "outputs": [],
   "source": [
    "# Comment out the following line if you\"re uploading the dataset for the first time.\n",
    "ds = hub.Dataset(url)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "intermediate-lewis",
   "metadata": {
    "id": "wrong-finding"
   },
   "source": [
    "#### Flushing dataset to disk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "spare-mailman",
   "metadata": {
    "id": "large-shanghai"
   },
   "outputs": [],
   "source": [
    "# If you\"ve gone ahead and uploaded your own dataset into Hub, run this command.\n",
    "# This command saves all changes to the cloud. You can also view this dataset at\n",
    "# https://app.activeloop.ai\n",
    "\n",
    "# ds.flush()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "earlier-museum",
   "metadata": {
    "id": "DMs9LyWwq6vw"
   },
   "source": [
    "## Fetching data from Hub"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "forward-spanking",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "floral-sociology",
    "outputId": "91845f74-60d4-4791-f980-f88701a05be5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'hub.api.dataset.Dataset'>\n",
      "SchemaDict({'Review': Text(shape=(None,), dtype='int64', max_shape=(13704,)), 'Label': ClassLabel(shape=(), dtype='int64', num_classes=2)})\n",
      "A simple and effective film about what life is all about, responding to challenges. It took a lot of gall for Homer and his friends to be able to grow into manhood without falling in the trap of a prefabricated future that runs from father to son, to be a miner in the local mine and never get out of that fate. It took also three different challenges for Homer and his friends to conquer a personal and free future. The challenge of the first ever man-made artificial satellite, Sputnik 1, a Soviet satellite, a milestone in human history, a turning point that Homer and his friends could not miss, did not want to miss. Then the challenge of science and applied mechanics to calculate and to devise a rocket from scratch or rather from what they could gather in books and order in their minds. Finally the challenge of a world that resists and refuses and tries to force you back into the pack, even with an untimely accident that forces you to get back into the pack for plain survival necessity, and even then Homer proved he had the guts to accept the challenge that was blocking for a while his own plans and dreams. But there is another side of the story that the film does not emphasize enough. Homer is the carrier of the project but he is also the carrier of the inspiration he and his friends need. If he is the one who is going to get the university scholarship, because his friends gave him precedence, his friends will also be able to get on their own roads and tracks and step out of the mining fate, thanks to the energy his inspiring example sets in front of their eyes. It is hard at times not to follow the example of the one who is like a beacon on a difficult road. But the film is also effective to show how the father resisted this dream because for him science was not the fabric of a true man, like mining or football. 
The working class fate that was so present in those 1950s and 1960s and still is present in some areas is too often enforced by the traditional thinking of the father. If the mother does not have the courage to speak up one day, the working class fate I am speaking of becomes a tremendous trap. Here too the film is effective and it should make some parents think. This might have been the fourth challenge Homer had to face: the challenge of taking a road that was not the one pointed at and programmed by his own father.<br /><br />Dr Jacques COULARDEAU, University Paris Dauphine & University Paris 1 Pantheon Sorbonne\n",
      "1\n"
     ]
    }
   ],
   "source": [
    "print(type(ds))\n",
    "print(ds.schema)\n",
    "\n",
    "print(ds[\"Review\", 4].compute())\n",
    "print(ds[\"Label\", 4].compute())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cutting-catch",
   "metadata": {
    "id": "OKx156WSqGMT"
   },
   "source": [
    "## Training a model with our dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bearing-supplier",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "id": "CeopDtlKf4x0",
    "outputId": "942475d1-7385-4128-93a9-bd8edcdc4385"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "type": "string"
      },
      "text/plain": [
       "'this is a test :) :('"
      ]
     },
     "execution_count": 10,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import re\n",
    "\n",
    "\n",
    "def preprocessor(text):\n",
    "    text = re.sub(\"<[^>]*>\", \"\", text)\n",
    "    emoticons = re.findall(\"(?::|;|=)(?:-)?(?:\\)|\\(|D|P)\", text)\n",
    "    text = re.sub(\"[\\W]+\", \" \", text.lower()) + \" \".join(emoticons).replace(\"-\", \"\")\n",
    "    return text\n",
    "\n",
    "\n",
    "preprocessor(\"This is a :) test :-( !\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "daily-fighter",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "u0I0Y0Aegjk3",
    "outputId": "ff00b66c-22e1-43d0-fa3a-5d0c5ceeeb29"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['I', 'find', 'it', 'fun', 'to', 'use', 'Hub']"
      ]
     },
     "execution_count": 11,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from nltk.stem.porter import PorterStemmer\n",
    "\n",
    "porter = PorterStemmer()\n",
    "\n",
    "\n",
    "def tokenizer(text):\n",
    "    return text.split()\n",
    "\n",
    "\n",
    "tokenizer(\"I find it fun to use Hub\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "voluntary-colors",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ZcVnkyqLgqfz",
    "outputId": "ef664771-9239-4730-beb8-959cc91d3eee"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['hub', 'is', 'extrem', 'easi', 'and', 'effici', 'to', 'use']"
      ]
     },
     "execution_count": 12,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def tokenizer_stemmer(text):\n",
    "    return [porter.stem(word) for word in text.split()]\n",
    "\n",
    "\n",
    "tokenizer_stemmer(\"Hub is extremely easy and efficient to use\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "interpreted-canal",
   "metadata": {
    "id": "KjPjDYRpKhPe"
   },
   "outputs": [],
   "source": [
    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
    "\n",
    "tfidf = TfidfVectorizer(\n",
    "    strip_accents=None,\n",
    "    lowercase=True,\n",
    "    preprocessor=preprocessor,\n",
    "    tokenizer=tokenizer_stemmer,\n",
    "    use_idf=True,\n",
    "    norm=\"l2\",\n",
    "    smooth_idf=True,\n",
    ")\n",
    "X = tfidf.fit_transform(\n",
    "    [item[\"Review\"].compute() for item in ds]\n",
    ")  # TF-IDF features for every review (split into train and test sets later)\n",
    "y = ds[\"Label\"].compute()  # Labels for every review"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "greek-spanking",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Qdme7UlWiKng",
    "outputId": "83d64fa8-2533-4ead-a46c-2e308ead672f"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.\n",
      "[Parallel(n_jobs=-1)]: Done   5 out of   5 | elapsed:   51.1s finished\n"
     ]
    }
   ],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.linear_model import LogisticRegressionCV\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, random_state=1, test_size=0.5, shuffle=True\n",
    ")\n",
    "clf = LogisticRegressionCV(\n",
    "    cv=5, scoring=\"accuracy\", random_state=0, n_jobs=-1, verbose=3, max_iter=300\n",
    ").fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "express-begin",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "z_7C3lCNnQmc",
    "outputId": "05b6d62c-b61f-43b7-b30e-bba1adf4df54"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy: 0.88648\n"
     ]
    }
   ],
   "source": [
    "print(f\"Accuracy: {clf.score(X_test, y_test)}\")"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [
    "whole-confusion"
   ],
   "name": "Getting Started with Text on Hub.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
